An examination of the source code revealed a bug that caused the problems with the evaluation of the distance to distribution. After the bug was fixed, it was decided to test different neural network configurations:
Each configuration has 2 hidden layers with 20 neurons.
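A minimal sketch of such a configuration, assuming 20 neurons per hidden layer, ReLU activations, and a single linear output for the predicted popularity (the activation and output choices are assumptions, not stated in the report):

```python
import numpy as np

def make_mlp(n_inputs, hidden=20, seed=0):
    """Initialise a 2-hidden-layer MLP (20 neurons in each hidden layer,
    assumed per-layer) with He-style random weights."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, hidden, hidden, 1]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU on the hidden layers, linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the two hidden layers only
    return x

params = make_mlp(n_inputs=4)          # 4 previous-window popularities as input
y = forward(params, np.zeros((3, 4)))  # batch of 3 dummy rows -> shape (3, 1)
```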
The tests were performed on 3 different datasets, produced from generated traces with 2 types of population.
Case 1: single population.
- Arrivals: Poisson with a mean inter-arrival time of 20.0.
- Popularity: Zipf with parameter 0.8.
- Number of items: 100 000.

Case 2: mixture of 2 populations.
- Arrivals: Poisson with a mean inter-arrival time of 40.0 for both populations.
- Popularity: Zipf with parameter 0.8 for both populations, but the IDs of the second population are randomly shuffled in each time window.
- Number of items: 50 000 in each population.
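The trace generator for one population can be sketched as follows, under the assumption that "Poisson arrivals" means exponential inter-arrival times with the given mean. Note that `numpy.random.Generator.zipf` requires a parameter greater than 1, so for the Zipf(0.8) popularity the items are sampled directly from normalised weights `i^-0.8`:

```python
import numpy as np

def generate_trace(n_requests, n_items, mean_interarrival, zipf_s, seed=0):
    """Sketch of a trace generator for one population (assumed form).
    Arrivals: Poisson process, i.e. exponential inter-arrival times.
    Popularity: Zipf(zipf_s) over item IDs, sampled from normalised
    weights i^-s since numpy's zipf sampler needs s > 1."""
    rng = np.random.default_rng(seed)
    times = np.cumsum(rng.exponential(mean_interarrival, n_requests))
    weights = np.arange(1, n_items + 1, dtype=float) ** -zipf_s
    weights /= weights.sum()
    ids = rng.choice(n_items, size=n_requests, p=weights)
    return times, ids

# Case 1, with a reduced item count for the sketch (100 000 in the report)
times, ids = generate_trace(10_000, 1_000, mean_interarrival=20.0, zipf_s=0.8)
```

For case 2, two such traces would be merged, with the second population's IDs re-shuffled at each time-window boundary.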
Dataset 1 is generated from the case 1 population, Dataset 2 from the case 2 population without keeping the class label, and Dataset 3 from the case 2 population keeping the class label. Each dataset consists of 6 columns: the ID of the object, its popularities in the 4 previous time windows, and its popularity in the 5th time window (which the NN should try to predict). Dataset 3 additionally contains a column with the class label (0 or 1).
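The dataset assembly can be sketched like this, assuming "popularity" means an item's share of requests within a time window (the exact definition is not stated in the report):

```python
import numpy as np

def build_dataset(times, ids, n_items, n_windows=5):
    """Sketch of dataset assembly (assumed form): split the trace into
    n_windows equal time windows, compute each item's share of requests
    per window, and emit one row per item:
    [id, pop_w1, pop_w2, pop_w3, pop_w4, pop_w5 (the prediction target)]."""
    edges = np.linspace(0, times[-1], n_windows + 1)
    win = np.clip(np.searchsorted(edges, times, side="right") - 1,
                  0, n_windows - 1)
    pops = np.zeros((n_items, n_windows))
    for w in range(n_windows):
        counts = np.bincount(ids[win == w], minlength=n_items)
        total = counts.sum()
        pops[:, w] = counts / total if total else 0.0
    return np.column_stack([np.arange(n_items), pops])
```

For Dataset 3 an extra class-label column would be appended to each row.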
During the tests it was observed that the neural network trains better on Dataset 2 than on Dataset 3, which shouldn't be the case, since Dataset 3 contains more information. It was therefore decided to transform Dataset 3 so that the 4 previous-time-window popularity columns become 8 columns. If an item's class label is 0, the first 4 of the new columns contain the popularity values and the last 4 are zero; if the class label is 1, the last 4 columns contain the values instead.
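The Dataset 3 transformation described above can be sketched as:

```python
import numpy as np

def split_by_class(pops, labels):
    """Expand the 4 previous-window popularity columns into 8 columns:
    rows with class label 0 keep their values in columns 0-3 (columns 4-7
    stay zero); rows with label 1 keep them in columns 4-7 instead."""
    out = np.zeros((pops.shape[0], 8))
    mask0 = labels == 0
    out[mask0, :4] = pops[mask0]
    out[~mask0, 4:] = pops[~mask0]
    return out

pops = np.array([[0.1, 0.2, 0.3, 0.4],
                 [0.5, 0.6, 0.7, 0.8]])
labels = np.array([0, 1])
out = split_by_class(pops, labels)
# row 0 keeps its values in columns 0-3, row 1 in columns 4-7
```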
Now let’s present the results that have been achieved:
As seen from the plots, this configuration performed reasonably well in terms of item ordering in all cases, even though the predicted popularities can be far from the real ones.
This configuration also performed reasonably well at item ordering, but the popularity predictions have less variability and are still not very accurate.
The last configuration, however, ordered the items in reverse for all 3 datasets. The predicted popularity is also almost the same for each item and close to the average popularity of 1e-5.
After rerunning the training a few times, the neural networks were able to order the items correctly, but the predicted-popularity behaviour stayed the same.
| Distance to distribution | Ordering |
|---|---|
| Dataset 1. Distance to distribution. | Dataset 1. Ordering. |
| Dataset 2. Distance to distribution. | Dataset 2. Ordering. |
| Dataset 3. Distance to distribution. | Dataset 3. Ordering. |
Attempts to change the number of layers, the number of neurons in the hidden layers, and the learning rate all produced the same behaviour.